Remote Sensing · Time Series Analysis · Forest Health Monitoring · South America

Monitoring Eucalyptus Plantation Health Over Two Decades

Generalized Additive Models · Landsat Time Series · Spectral Index Gap-Filling · R
Joselyne MPAYIMANA  |  University of British Columbia  |  2024
R mgcv GAM Landsat NDVI NDMI EVI Time Series Gap Filling Forest Health Eucalyptus REML

Overview

Commercial Eucalyptus plantations are a critical component of the forestry sector across South America, providing timber, pulpwood, and carbon sequestration services. Monitoring their health over time using satellite imagery is essential for detecting disturbances early, planning harvests, and assessing long-term stand productivity. However, Landsat time series data are rarely complete: cloud cover, sensor gaps, and acquisition scheduling leave substantial periods without observations.

This project applies Generalized Additive Models (GAMs) to a 20-year Landsat multispectral time series from a Eucalyptus stand in South America to characterize vegetation dynamics, fill temporal gaps in spectral index data, and distinguish seasonal variation from long-term trends. The dataset spans 2000 to 2019 and captures a striking disturbance event around 2002 to 2003 where NDVI dropped sharply from values near 0.90 to below 0.35, followed by a multi-year recovery trajectory that GAMs are well-positioned to describe.

20
Years of Landsat observations (2000–2019)
189
Valid NDVI observations out of 395 scenes
76.7%
Deviance explained by best GAM model
3
Spectral indices modelled: NDVI, NDMI, EVI

The Challenge: Incomplete Satellite Time Series

Satellite remote sensing produces time series data that are inherently gappy. Cloud cover is the most common cause of missing observations, but sensor malfunctions, orbital gaps, and radiometric processing failures also contribute. In the dataset used here, only 189 of 395 scheduled Landsat scenes contain valid spectral reflectance values for the stand of interest, meaning roughly 52% of observations are missing.

Simple interpolation approaches such as linear interpolation or moving averages struggle with data of this kind because they cannot simultaneously account for long-term trends and within-year seasonality. A model that treats all gaps as equivalent, regardless of whether they fall in a high-growth summer period or a dormant winter period, will produce systematically biased estimates. GAMs address this limitation by explicitly modelling both the temporal trend and the seasonal cycle as separate smooth components, allowing gap-filling predictions to be ecologically informed.


Data

The dataset contains multispectral reflectance values from the Landsat satellite mission for a single Eucalyptus stand (ID: 32041-1-5) in South America. Each observation includes surface reflectance in five Landsat bands and three derived spectral indices: NDVI (Normalized Difference Vegetation Index), NDMI (Normalized Difference Moisture Index), and EVI (Enhanced Vegetation Index). Time is represented as a continuous decimal year value (Year_Continuous) to allow use in the smooth functions of the GAM.

Interactive Time Series Explorer

The chart below shows all valid observations for the three spectral indices across the 20-year study period. Use the buttons to switch between indices. The dramatic drop visible around 2002 to 2003 reflects a disturbance event followed by a gradual recovery of canopy greenness and moisture content over subsequent years.

Landsat Spectral Index Time Series — Eucalyptus Stand, South America (2000–2019)
Figure 1. NDVI time series (2000–2019) for a Eucalyptus stand in South America derived from Landsat multispectral imagery. Each point represents a cloud-free observation. The sharp decline around 2002–2003 reflects a stand-level disturbance event followed by gradual canopy recovery.

Modelling Approach: Generalized Additive Models

GAMs extend linear models by replacing fixed-slope terms with flexible smooth functions, making them well-suited for ecological time series where the relationship between time and vegetation response is nonlinear. The smoothing effect comes from two components: the smooth function s(), which uses spline basis functions to flexibly fit the data, and a penalty term that controls how wiggly the fitted curve can become. The degree of smoothness is automatically tuned using Restricted Maximum Likelihood (REML), preventing the model from simply connecting every observed point.

Four models of increasing complexity were fitted to characterize different aspects of the vegetation signal. Each builds on the previous by adding components that capture additional sources of variation.

reg_01b — Temporal Trend Only

A GAM with a single Gaussian process smooth on the continuous year variable. Captures the long-term trend in NDVI without accounting for within-year seasonality.

Basis: Gaussian process (gp) | Method: REML
reg_02 — Seasonal Cycle Only

A cyclic cubic spline on day-of-year, fitted only to post-2008 data where the stand has recovered. Isolates the within-year seasonal pattern of canopy greenness.

Basis: Cyclic cubic (cc) | Data: 2008 onward
reg_03 — Trend + Seasonality

Combines a Gaussian process smooth on year with a cyclic cubic spline on day-of-year. This additive structure separates the long-term recovery trend from seasonal fluctuations.

Deviance explained: 76.7% | R² = 0.782
reg_04 — Trend × Season Interaction

Adds a tensor interaction term between year and day-of-year, allowing the seasonal amplitude to shift over time. Applied to NDMI to test whether moisture seasonality changed across the study period.

Basis: Tensor product (ti) | Includes interaction

Key Model Code

Best performing model — trend and seasonality combined
reg_03 <- gam(
  ndvi ~ s(Year, bs = "gp") + s(doy, bs = "cc", k = 20),
  data   = subset(my_sdata, Year >= 2004),
  method = "REML"
)
summary(reg_03)
Tensor interaction model for NDMI
reg_04 <- gam(
  ndmi ~ s(Year) + s(doy, bs = "cc") + ti(doy, Year),
  data   = my_sdata,
  method = "REML",
  select = TRUE
)
Gap-filling: generating predictions for all days 2003–2020
new_data            <- expand.grid(Year = 2003:2020, doy = 1:365)
new_data$Year_Continuous <- new_data$Year + new_data$doy / 365
pred_gam            <- predict(reg_03, newdata = new_data, se.fit = TRUE)

Results

The combined trend-plus-seasonality model (reg_03) explained 76.7% of the deviance in NDVI with an adjusted R² of 0.782, substantially outperforming the trend-only model. The Gaussian process smooth on year captured the sharp decline and multi-year recovery following the 2003 disturbance event, while the cyclic cubic spline on day-of-year captured the within-year seasonal pulse of photosynthetic activity characteristic of Eucalyptus stands.

The NDMI models showed an even better fit than NDVI, likely because moisture-sensitive spectral indices exhibit less high-frequency noise than NDVI in densely vegetated Eucalyptus canopies. The tensor interaction model for NDMI confirmed a statistically significant interaction between year and day-of-year, meaning the amplitude of seasonal moisture variation shifted over the course of the study period, an ecologically meaningful finding for plantation managers tracking drought stress.

Model Performance Summary

ModelIndexComponentsDeviance ExplainedAdj. R²
reg_01bNDVITemporal trend only~55%~0.54
reg_02NDVISeasonal cycle only (post-2008)~40%~0.38
reg_03NDVITrend + seasonality76.7%0.782
reg_03bNDMITrend + seasonality>78%>0.78
reg_04NDMITrend + seasonality + interactionSignificant interaction termN/A

Reflections on Model Choice and Limitations

The GAMs fitted here appear to slightly underfit in periods of high seasonal variation, where confidence intervals widen and the model struggles to follow rapid transitions. This is a deliberate trade-off: the smoothing penalty prevents overfitting to noise while accepting some bias in high-variability periods. If overfitting were a concern, reducing the number of knots k or increasing the smoothing penalty would constrain the curve further.

One important limitation of these models is that they are entirely aspatial. They describe temporal dynamics at a single stand without accounting for spatial autocorrelation among neighboring stands. In a landscape where adjacent Eucalyptus plots share similar soil conditions, microclimates, and management histories, ignoring spatial structure may underestimate uncertainty and overstate the generalizability of the fitted trends. Two natural extensions would be spatial lag or error models, which explicitly incorporate a spatial weights matrix, or Geographically Weighted Regression, which allows model coefficients to vary across the landscape. Both approaches have been applied to satellite-derived spectral indices in forestry contexts and would be straightforward extensions of this framework.

A parametric alternative to GAMs would be a linear mixed-effects model with polynomial terms for year and day-of-year. Mixed-effects models are more interpretable since each coefficient directly describes the direction and magnitude of a predictor's effect, but they require assumptions about the functional form of the temporal trend that GAMs do not. A non-parametric alternative such as Random Forest can capture more complex nonlinear relationships without any functional assumptions, but at the cost of interpretability: understanding how a variable drives predictions requires additional tools such as SHAP values rather than simply reading off model coefficients.


Applications and Broader Relevance

The gap-filling framework demonstrated here has direct operational relevance for plantation forestry management. Most commercial Eucalyptus operations use satellite imagery to track stand health, plan fertilization and irrigation schedules, and detect early signs of pest or disease outbreaks. When cloud cover interrupts the time series during a critical growth window, managers lose the ability to make informed decisions. GAMs trained on historical data from the same stand can fill those gaps with uncertainty-quantified predictions rather than simply flagging the period as missing.

Beyond gap-filling, this approach is directly applicable to modelling leaf area index and above-ground biomass as a function of climate variables such as temperature and precipitation, helping predict how stand productivity responds to changing conditions. It is also applicable to early warning systems for drought stress, where NDMI trends can signal moisture deficits before they become visible in NDVI, giving managers an earlier window for intervention.


Joselyne MPAYIMANA  |  Master of Geomatics for Environmental Management, UBC  |  2024